Abstract

We study a model of learning on social networks in dynamic environments, describing a group of
agents who are each trying to estimate an underlying state that varies over time, given access to weak
signals and the estimates of their social network neighbors.

We study three models of agent behavior. In the fixed response model, agents use a fixed linear
combination to incorporate information from their peers into their own estimate. This can be thought of
as an extension of the DeGroot model to a dynamic setting. In the best response model, players calculate
minimum variance linear estimators of the underlying state.

We show that regardless of the initial configuration, fixed response dynamics converge to a steady
state, and that the same holds for best response on the complete graph. We show that best response
dynamics can, in the long term, lead to estimators with higher variance than is achievable using well
chosen fixed responses.

The penultimate prediction model is an elaboration of the best response model. While this model
only slightly complicates the computations required of the agents, we show that in some cases it greatly
increases the efficiency of learning, and on complete graphs is in fact optimal, in a strong sense.

The past three decades have witnessed an immense effort by the computer science and economics communities
to model and understand people's behavior on social networks [17]. A particular goal has been the study of
how people share information and learn from each other; learning from peers has been repeatedly shown to
be a driving force of many economic and social processes (cf. [10, 8, 20, 9]).

1.1 Classical approaches and results

Early work by DeGroot [11] considered a set of agents, connected by a social network, that each have a prior
belief: a distribution over the possible values of an underlying state of the world, say the market value of
some company. The agents iteratively observe their neighbors' beliefs and update their own by averaging the
distributions of their neighbors. Since DeGroot, a plethora of models for social learning have been proposed
and studied.

DeGroot's simple averaging of neighbors' beliefs may seem naive and arbitrary; economists often opt for
rational models instead. In rational models the agents update their belief not by a fixed rule, but in an 
attempt to maximize a utility function. It is often assumed that agents are Bayesian: they assume some 
prior distribution on the underlying state and on other agents' behavior, have access to some observations, 
and maximize the expected value of their utility, using Bayes' Law. Bayesian social learning has a wide 
literature, with noted work by Aumann [4] and the related common knowledge work (cf. [14]), as well as 
McKelvey and Page [21], Parikh and Krasucki [24], Bala and Goyal [6], Gale and Kariv [13], and many 
others.

Aumann [4] and Geanakoplos [15] show that a group of Bayesian agents, who each have an initial estimate
of an underlying state, and repeatedly announce their estimate (in particular, expected value) of this state,
will eventually converge to the same estimate. McKelvey and Page [21] extend this result to processes in
which "survey results", rather than all the estimates, are repeatedly shared. The social network in these 
models is the complete network; indeed, it seems that non-trivial dynamics and results are achieved already 
for this (seemingly simple) topology. Aaronson [1] studies the complexity of the computations required of 
the agents, again with highly non-trivial results. 

1.2 Rationality and bounded rationality 

The term rational in economic theory refers to any behavior that maximizes (or even attempts to maximize) 
some utility function. This is in contrast to, for example, behavior that is heuristic or fixed. Bayesian 
rationality optimizes in a probabilistic framework that includes a prior and observations, and is, as mentioned 
above, a commonly used paradigm. 

The disadvantage of fully rational, Bayesian models is that the calculations required of the agents can 
very quickly become intractable, making their applicability to real-world settings questionable; this tension 
between rationality and tractability is an old recurring theme in behavioral economics models (cf. [25]). 

A solution often advocated is bounded rationality. Agents still act optimally in bounded rationality 
models, but only optimize with respect to a restricted set of choices. This usually simplifies the optimization 
problem that needs to be solved. For example, agents may be required to disregard some of their available 
information or be restricted in the manner that they calculate their strategy. In addition to serving the goal 
of more realistically modeling agents, a usual added benefit of bounded rationality is that the analysis of the 
model becomes easier. We too follow this course. 

A standard assumption in this literature is that "actions speak louder than words" (cf. Smith and 
Sørensen [26]); agents do not participate in a communication protocol intended to optimize the exchange of
information, but rather make inferences about each others' private information by observing actions. For 
example, by observing the price at which a person bids for a stock one may learn her estimate for the future 
price, but yet not learn all of the information which she used to arrive at this estimate. 

1.3 Informal statement of the model 

We consider a model where the underlying state $S$ is not a constant number - as it is in all of the above
mentioned models - but changes with time, as prices and other economic quantities tend to do. In particular
we assume that the state $S = S(t)$ performs a random walk; $S(0)$ is picked from some distribution, and at
each iteration an i.i.d. random variable is added to it.

The process commences with each agent having some estimator of $S(0)$. We make only very weak
assumptions about the joint distribution of these estimators. Then, at each discrete time period $t$, each
agent receives an independent (and identically distributed over time) measurement of $S(t)$, and uses it to update its
estimator. Also available to it are the previous estimates of its neighbors on a social network. Thus social
network neighbors share their beliefs (or rather, observe each others' actions), and information propagates
through the network.

While conceivably agents could optimally use all the information available to them to estimate the
underlying state, it appears that such calculations are extremely complex. Instead, we explore bounded rationality
dynamics, assuming that agents are restricted to calculating linear combinations of their observations. We
note that if the random walk and the measurements are taken to be Gaussian, then the minimum variance
unbiased linear estimator (MVULE) is also the maximum likelihood estimator. A Gaussian random walk is
a good first-order approximation for many economic processes (cf. classical work by Bachelier [5]).

For the first part of the paper, we also require that these linear combinations only involve the agents' 
neighbors' estimates from the previous time period (and not earlier), as well as their new measurement. In 
the last model we slightly relax this requirement. 

We consider three models. In the fixed response model each agent, at each time period, estimates the 
underlying state by a fixed linear combination of its new measurement and the estimates of its neighbors in 
the previous period. This is a straightforward extension of the DeGroot model to our setting.

In the best response model, at each iteration, each agent calculates the MVULE of the underlying state,
based on its peers' estimates from the previous round, together with its new measurement. We assume here
that at each iteration the agents know the covariance matrix of their estimators. While this may seem like 
a strong assumption, we note that, under some elaboration of our model, this covariance matrix may be 
estimated by observing the process for some number of rounds before updating one's estimator. Furthermore, 
it seems that assumptions in this spirit - and often much stronger assumptions - are necessary in order for 
agents to perform any kind of optimization. For example, it is not rare in the literature of social Bayesian 
learning to assume that the agents know the structure of the entire social network graph (e.g., [13, 24, 2]). 

Finally, we introduce the penultimate prediction model, which is a simple extension of the best response 
model, additionally allowing the agents to remember exactly one value from one round to the next. While 
only slightly increasing the computational requirements on the agents, this model exhibits a sharp increase 
in learning efficiency. 

1.4 Informal statement of results 

While our long term goal is to understand this process on general social network graphs, we focus in this 
paper on the complete network, which already exhibits mathematical richness. 

We consider the system to be in a steady state when the covariance matrix of the agents' estimators 
is constant. On general graphs we show that fixed response dynamics converge to a steady state. On the 
complete graph we show that best response dynamics also converge to a steady state. Both of these results 
hold regardless of the initial conditions (i.e., the agents' estimators at time t = 0). 

We show that the steady state of best response dynamics is not necessarily optimal; there exist fixed
response dynamics in which the agents converge to estimators which all have lower variance than the
estimators of the steady state of best response dynamics. This shows that every agent can do better than the
result of best response by following a socially optimal rule; thus a certain price of anarchy is to be paid when 
agents choose the action that maximizes their short term gain. 

Finally, we show that in the penultimate prediction model, for the complete graph, the agents learn 
estimators which are the optimal (in the minimum variance sense) amongst all linear estimators, and thus 
outperform those of fixed and best response dynamics. 

We define a notion of "socially asymptotic learning": A model has this property when the variance of 
the agents' steady-state estimators tends towards the information-theoretical optimum with the number of 
agents. We show that the penultimate prediction model exhibits socially asymptotic learning on the complete 
graph, while best response and fixed response dynamics fail to do the same. 

2 Previous work 

Our model is an elaboration of models studied by DeMarzo, Vayanos and Zwiebel [12], as well as Mossel and
Tamuz [23, 22]. There, the state $S$ is a fixed number picked at time $t = 0$, and each agent receives a single
measurement of it. The process thereafter is deterministic, with each agent, at each iteration, recalculating
its estimate of $S$ based on its observation of its neighbors' estimates.

In [22] it is shown that if the agents calculate the minimum variance unbiased linear estimator (MVULE)
at every turn (remembering all of their observations) then all the agents converge to the optimal estimator
of $S$, i.e. the average of the original measurements. Furthermore, this happens in time that is at most $n \cdot d$,
where $d$ is the diameter of the graph.

When agents calculate estimates that are only based on their observations from the previous round, then 
they do not necessarily converge to the optimal estimator [23]. In fact, it is not known whether they converge 
at all. 

A similar model is studied by Jadbabaie, Sandroni and Tahbaz-Salehi [18]. They explore a bounded 
rationality setting in which agents receive new signals at each iteration. An agent's private signals may be 
informative only when combined with those of other agents, and yet their model achieves efficient learning. 

Our model is a special case of a model studied by Acemoglu, Nedić, and Ozdaglar [3]. They extend
these models by allowing the state to change from period to period. They don't require the change in the
state to be i.i.d., but only to have zero mean and be independent in time. Their agents also receive a new,
independent measurement of the state at every period, which again need not be identically distributed. They
focus on a different regime than the one we study; their main result is a proof of convergence in the case
that the variations in the state diminish with time, with variance tending to zero.

In our model the change in the underlying state has constant variance, as does the agents' measurement
noise. This allows us to explore steady states, in which the covariance matrix of the agents' estimators does
not change from iteration to iteration.

Our model is non-trivial already for a single agent, although here a complete solution is simple, and can 
be calculated using tools developed for the analysis of Kalman filters [19]. 

3 Notation, formal models, and results 

Let $[n] = \{1, 2, \ldots, n\}$ be a set of agents. Let $G = ([n], E)$ be a directed graph representing the agents' social
network. We denote by $\partial i = \{j : (i, j) \in E\}$ the neighbors of $i$, and assume that always $i \in \partial i$.

We consider discrete time periods $t \in \{0, 1, \ldots\}$. The underlying state of the world at time $t$, $S(t)$, is
defined as follows. $S(0)$ is a real random variable with arbitrary distribution, and for $t > 0$

$$S(t) = S(t-1) + X(t-1), \tag{1}$$

where $\mathbb{E}[X(t)] = 0$, $\mathrm{Var}[X(t)] = \sigma^2$, and $\sigma$ is a parameter of the model. The random variables $X(0), X(1), \ldots$
are independent. Hence the underlying state $S(t)$ performs a random walk whose steps have zero mean and standard
deviation $\sigma$.

At time $t = 0$ each agent $i$ receives $Y_i(0)$, an estimator of $S(0)$. The only assumptions we make on their
joint distribution are that $\mathbb{E}[Y_i(0) \mid S(0)] = S(0)$, i.e. the estimators are unbiased, and that $\mathrm{Var}[Y_i(0) - S(0)]$
is finite for all $i$.

At each subsequent period $t > 0$, each agent $i$ receives $M_i(t)$, an independent measurement of $S(t)$,
defined by

$$M_i(t) = S(t) + D_i(t), \tag{2}$$

where $\mathbb{E}[D_i(t)] = 0$, $\mathrm{Var}[D_i(t)] = \tau_i^2$, and the $\tau_i$'s are parameters of the model. Hence $D_i(t)$ is the
measurement error of agent $i$ at time $t$. Again, the random variables $D_i(t)$ are independent.

At each period $t > 0$, each agent $i$ calculates $Y_i(t)$, agent $i$'s estimate of $S(t)$, using the information
available to it. Precisely what information is available varies by the model (and is defined below), but in all
cases $Y_i(t)$ is a (deterministic) convex linear combination of agent $i$'s measurements up to and including time
$t$, $\{M_i(t') \mid t' \le t\}$, as well as the previous estimates of its social network neighbors, $\{Y_j(t') \mid t' < t, j \in \partial i\}$.
Additionally, in the penultimate prediction model, at each round $t$ each agent computes a value $R_i(t)$, and
at round $t + 1$ uses this value to compute $R_i(t + 1)$ and $Y_i(t + 1)$. Like $Y_i(t)$, $R_i(t)$ is also a convex linear
combination of the same random variables.

In general, we shall assume that the agents are interested in minimizing the expected squared error of
their estimators, $\mathbb{E}[(Y_i(t) - S(t))^2]$; assuming $Y_i(t)$ is unbiased (i.e., $\mathbb{E}[Y_i(t) \mid S(t)] = S(t)$) this is equivalent
to minimizing $\mathrm{Var}[Y_i(t) - S(t)]$, which we refer to as the "variance of the estimator $Y_i(t)$." We shall assume
throughout that the estimators $Y_i(t)$ are indeed unbiased; we elaborate on this in the definitions of the
models below.

We shall (mostly) restrict ourselves to the case where the agents use only their neighbors' estimates from
the previous iteration, and not from the ones before it. In these cases we write

$$Y_i(t) = A_i(t) M_i(t) + \sum_j P_{ij}(t) Y_j(t-1) \tag{3}$$

for some $A_i(t)$ and $P_{ij}(t)$ such that $P_{ij}(t) = 0$ whenever $j \notin \partial i$.

We will find it convenient to express such quantities in matrix form. To that end we let $\mathbf{m}(t), \mathbf{y}(t), \mathbf{d}(t) \in \mathbb{R}^n$
be column vectors with entries $M_i(t), Y_i(t), D_i(t)$, and let $A(t), P(t), T \in \mathbb{R}^{n \times n}$ be the weight matrices,
with $A(t) = \mathrm{Diag}(A_1(t), \ldots, A_n(t))$, $P(t) = (P_{ij}(t))_{ij}$ and $T = \mathrm{Var}[\mathbf{d}(t)] = \mathrm{Diag}(\tau_1^2, \ldots, \tau_n^2)$. Using this notation
Eq. (3) becomes

$$\mathbf{y}(t) = A(t)\mathbf{m}(t) + P(t)\mathbf{y}(t-1). \tag{4}$$

We will also make use of the covariance matrix

$$C(t) = \mathrm{Var}[\mathbf{y}(t) - \mathbf{1}S(t)], \tag{5}$$

where $\mathbf{1} \in \mathbb{R}^n$ denotes the column vector of all ones. Hence $C_{ij}(t) = \mathrm{Cov}[Y_i(t) - S(t), Y_j(t) - S(t)]$, which
we refer to as the "covariance of the estimators $Y_i(t)$ and $Y_j(t)$."
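To make the model concrete, the following is a minimal simulation sketch of Eqs. (1), (2) and (4). It assumes Gaussian steps and noise, a complete graph, identical measurement noise for all agents, and uniform fixed weights; the function name `simulate` and all default values are hypothetical choices made for illustration and do not come from the paper.

```python
import random

def simulate(n=5, sigma=1.0, tau=2.0, a=0.5, steps=20000, seed=0):
    """Simulate Eqs. (1), (2), (4) with fixed weights on a complete graph.

    Illustrative assumptions (not from the paper): Gaussian steps and
    noise, the same tau for every agent, A_i = a and P_ij = (1 - a) / n.
    Returns the empirical mean squared error of the agents' estimates.
    """
    rng = random.Random(seed)
    s = 0.0                                          # S(0)
    y = [s + rng.gauss(0.0, 1.0) for _ in range(n)]  # unbiased Y_i(0)
    p = (1.0 - a) / n                                # each row of P sums to 1 - a
    sq_err = 0.0
    for _ in range(steps):
        s += rng.gauss(0.0, sigma)                   # Eq. (1): random-walk step
        prev_sum = sum(y)                            # neighbors' estimates from t - 1
        # Eq. (2) gives the measurement s + noise; Eq. (4) mixes it with peers.
        y = [a * (s + rng.gauss(0.0, tau)) + p * prev_sum
             for _ in range(n)]
        sq_err += sum((yi - s) ** 2 for yi in y)
    return sq_err / (steps * n)

mse = simulate()
```

With these defaults an agent relying on its own measurement alone would have squared error $\tau^2 = 4$, so the empirical value of `mse` gives a rough sense of how much the averaging over peers helps.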

3.1 Dynamics models 

3.1.1 Best response 

The main model we study is the best response dynamics. Here we assume that at round $t$, each agent $i$
has access to $M_i(t)$, $\mathbf{y}(t-1)$ and the covariance matrix for these values. At each iteration $t$, agent $i$ picks
$A_i(t)$ and $\{P_{ij}(t)\}_j$ that minimize $C_{ii}(t) = \mathrm{Var}[Y_i(t) - S(t)]$, under the constraints that (a) $P_{ij}(t)$ may be
non-zero only if $j \in \partial i$, and (b) $\mathbb{E}[Y_i(t) \mid S(t)] = S(t)$, i.e. $Y_i(t)$ is an unbiased estimator of $S(t)$. In Section 6.1
we show that these minimizing coefficients are a deterministic function of $C(t-1)$, $\sigma$ and $\{\tau_i\}$. Hence we
assume here that the agents know these values. By this definition $Y_i(t)$ is the minimum variance unbiased
linear estimator (MVULE) of $S(t)$, given $M_i(t)$ and $\mathbf{y}(t-1)$.

Note that it follows from our definitions that if the estimators $\{Y_i(t-1)\}$ at time $t-1$ are unbiased then,
in order for the estimators at time $t$ to be unbiased, it must be the case that

$$A_i(t) + \sum_j P_{ij}(t) = 1. \tag{6}$$

Since at time zero the estimators are unbiased, it follows by induction that Eq. (6) holds for all $t > 0$.

3.1.2 Fixed response 

We shall also consider the case of estimators which are fixed linear combinations of the agent's new
measurement $M_i(t)$ and its neighbors' estimators at time $t-1$. These we call fixed response estimators. In this case
we would have, using our matrix notation:

$$\mathbf{y}(t) = A\mathbf{m}(t) + P\mathbf{y}(t-1). \tag{7}$$

The matrices $A$ and $P$ are arbitrary matrices that satisfy the following conditions: (a) $P_{ij}$ is non-negative, and
non-zero only if $j \in \partial i$, and (b) $Y_i(t)$ is a convex linear combination of $M_i(t)$ and $\{Y_j(t-1)\}_j$. Equivalently,
$A_i + \sum_j P_{ij} = 1$, which is the same condition described in Equation (6).

3.1.3 Penultimate prediction 

Finally, we consider the penultimate prediction model where each agent $i$ can remember one value, which we
denote $R_i(t)$, from one round $t$ to the next round $t + 1$. We assume that at round $t$, each agent $i$ has access
to $M_i(t)$, $\mathbf{y}(t-1)$, $R_i(t-1)$ and the covariance matrix for these values. We denote $\mathbf{r}(t) = (R_1(t), \ldots, R_n(t))$.

We fix $R_i(0) = 0$, and let $R_i(t)$ be agent $i$'s MVULE of $S(t-1)$, given $R_i(t-1)$ and $\mathbf{y}(t-1)$ (note that
this is in general not equal to $Y_i(t-1)$). $Y_i(t)$ now becomes the MVULE of $S(t)$ given $R_i(t)$ and $M_i(t)$.

3.2 Steady states and efficient learning 

We say that the system converges to a steady state $C$ when

$$\lim_{t \to \infty} C(t) = C.$$

Assuming that agents are constrained to calculating linear combinations of their measurements and
neighbors' estimators, the variance of the estimators $Y_i(t)$ of $S(t)$ at time $t$ can be bounded from below by
the variance of $Z_i(t)$, where we define $Z_i(t)$ to be the MVULE of $S(t)$ given the initial estimators $\mathbf{y}(0)$, all
measurements up to time $t-1$, $\{M_j(s) \mid j \in [n], s < t\}$, and $M_i(t)$. We therefore say that a process achieves
perfect learning when $\mathrm{Var}[Y_i(t) - S(t)] = \mathrm{Var}[Z_i(t) - S(t)]$. Note that this definition is a natural one for
the complete graph and should be altered for general networks, where a tighter lower bound exists.

If an agent were to know $S(t-1)$ exactly at time $t$, then, together with $M_i(t)$, its minimum variance
unbiased linear estimator for $S(t)$ would be a linear combination of just $S(t-1)$ and $M_i(t)$, because of the
Markov property of $S(t)$. In this case it is easy to show (see Proposition 2) that $C_{ii}(t) = \mathrm{Var}[Y_i(t) - S(t)]$
would equal $\sigma^2\tau_i^2/(\sigma^2 + \tau_i^2)$. We say that a model achieves socially asymptotic learning if, as the number
of agents $n$ tends to infinity, the steady state $C$ exists for $n$ sufficiently large and $C_{ii}$ tends to $\sigma^2\tau_i^2/(\sigma^2 + \tau_i^2)$
for all $i$. We stress that this definition only makes sense in models where the number of agents $n$ grows to
infinity and therefore is incomparable to perfect learning, which is defined for a particular graph.
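The benchmark value $\sigma^2\tau_i^2/(\sigma^2 + \tau_i^2)$ is what inverse-variance weighting of two independent unbiased estimators yields. The short sketch below checks this numerically; the helper name `combine_two` and the parameter values are illustrative assumptions.

```python
def combine_two(var_prior, var_meas):
    """Inverse-variance weighting of two independent unbiased estimators.

    Returns the weight placed on the measurement and the resulting variance.
    """
    w_meas = var_prior / (var_prior + var_meas)
    var = w_meas ** 2 * var_meas + (1 - w_meas) ** 2 * var_prior
    return w_meas, var

sigma2, tau2 = 1.0, 4.0
# Knowing S(t-1) exactly leaves uncertainty sigma^2 about S(t);
# the new measurement M_i(t) has variance tau_i^2.
w, floor = combine_two(sigma2, tau2)
```

Here `floor` agrees with `sigma2 * tau2 / (sigma2 + tau2)`, the variance that socially asymptotic learning approaches.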

4 Statement of the main results 

The following are our main results. Let $\beta(t) = 1/(\mathbf{1}^\top C(t)^{-1}\mathbf{1})$.

Theorem 8. When $G$ is a complete graph, best-response dynamics converge to a unique steady-state, for all
starting estimators $\mathbf{y}(0)$ and all choices of parameters $\{\tau_i\}$ and $\sigma$. Moreover, the convergence is fast, in the
sense that $-\log|\beta(t) - \beta^*| = \Omega(t)$, where $\beta^* = \lim_{t \to \infty} \beta(t)$.

Theorem 11. In fixed response dynamics, if $A_i > 0$ for all $i \in [n]$ then the system converges to a steady state
$C = \lim_{t \to \infty} C(t)$ such that

$$C = A^2 T + \sigma^2 P\mathbf{1}\mathbf{1}^\top P^\top + PCP^\top. \tag{8}$$

In particular, $C$ is independent of the starting estimators $\mathbf{y}(0)$.
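As a numerical illustration of Theorem 11, the sketch below iterates the covariance update whose fixed point is Eq. (8) from two different starting matrices and compares the limits. The complete graph, the uniform weights, and the helper names (`matmul`, `step`, `steady_state`) are assumptions chosen for the example.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def step(C, a, P, tau2, sigma2):
    """One application of C -> A^2 T + sigma^2 P 1 1^T P^T + P C P^T."""
    n = len(C)
    row = [sum(r) for r in P]                 # the vector P 1
    Pt = [list(col) for col in zip(*P)]       # P transposed
    PCPt = matmul(matmul(P, C), Pt)
    return [[(a[i] ** 2 * tau2[i] if i == j else 0.0)
             + sigma2 * row[i] * row[j] + PCPt[i][j]
             for j in range(n)] for i in range(n)]

def steady_state(C0, a, P, tau2, sigma2, iters=200):
    """Iterate the update a fixed number of times from C0."""
    C = C0
    for _ in range(iters):
        C = step(C, a, P, tau2, sigma2)
    return C

n, sigma2 = 5, 1.0
a, tau2 = [0.5] * n, [4.0] * n
P = [[(1 - a[i]) / n] * n for i in range(n)]  # rows sum to 1 - A_i
identity = [[float(i == j) for j in range(n)] for i in range(n)]
far = [[100.0 * float(i == j) for j in range(n)] for i in range(n)]
C_a = steady_state(identity, a, P, tau2, sigma2)
C_b = steady_state(far, a, P, tau2, sigma2)
```

For this symmetric example Eq. (8) can be solved by hand: the diagonal entries equal 1.4 and the off-diagonal entries equal 0.4, and both starting points reach that matrix.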

Theorem 13. Let $G$ be a graph with vertex set $[n]$. Fix $\sigma$ and $\{\tau_i\}_{i \in [n]}$.

Consider best response dynamics for $n$ agents on $G$ with $\sigma$ and $\{\tau_i\}_{i \in [n]}$. Let $C^{\mathrm{br}}$ denote the steady state
the system converges to.

Consider fixed response dynamics with some $P$ and $A$ for $n$ agents on $G$ with $\sigma$ and $\{\tau_i\}_{i \in [n]}$. Let $C^{\mathrm{fr}}$
denote the steady state the system converges to.

Then there exists a choice of $n$, $G$, $\sigma$, $\{\tau_i\}$, $A$ and $P$ such that $C^{\mathrm{br}}_{ii} > C^{\mathrm{fr}}_{ii}$ for all $i \in [n]$.

Theorem 16. If $\sigma, \tau > 0$, no fixed response dynamics can achieve socially asymptotic learning.

Theorem 17. Penultimate prediction on the complete graph achieves perfect learning. 

5 Background results 

5.1 Time evolution of the covariance matrix 

We commence by proving a preliminary proposition on the relation between the coefficient matrices $P(t)$
and $A(t)$, and the covariance matrix $C(t)$ in the best response and fixed response models. This result does
not depend on how $P(t)$ and $A(t)$ are calculated, and therefore applies to both models.

First, let us calculate the covariance matrix directly. By the definition of $C(t)$ and by Eq. (4) we have
that

$$C(t) = \mathrm{Var}[\mathbf{y}(t) - \mathbf{1}S(t)] = \mathrm{Var}[A(t)\mathbf{m}(t) + P(t)\mathbf{y}(t-1) - \mathbf{1}S(t)].$$

Since $S(t) = S(t-1) + X(t-1)$ we can write

$$C(t) = \mathrm{Var}[A(t)(\mathbf{m}(t) - \mathbf{1}S(t)) + P(t)\mathbf{y}(t-1) - (I - A(t))\mathbf{1}(S(t-1) + X(t-1))].$$

Since the estimators $\{Y_i(t)\}$ are unbiased, $P(t)\mathbf{1} = (I - A(t))\mathbf{1}$; see the definitions of the models in
Section 3.1, and in particular Eq. (6). Hence, as the three resulting terms are uncorrelated,

$$C(t) = A(t)TA(t)^\top + \mathrm{Var}[P(t)(\mathbf{y}(t-1) - \mathbf{1}S(t-1))] + \mathrm{Var}[P(t)\mathbf{1}X(t-1)],$$

since $\mathrm{Var}[\mathbf{m}(t) - \mathbf{1}S(t)] = \mathrm{Var}[\mathbf{d}(t)] = T$. Finally, since $\mathrm{Var}[\mathbf{y}(t-1) - \mathbf{1}S(t-1)] = C(t-1)$ we can write

$$C(t) = A(t)^2 T + P(t)C(t-1)P(t)^\top + \sigma^2 P(t)\mathbf{1}\mathbf{1}^\top P(t)^\top. \tag{9}$$

Proposition 1. Let $Q(r,t) = \prod_{\ell=r+1}^{t} P(\ell)$ and $W(t) = A(t)^2 T + \sigma^2 P(t)\mathbf{1}\mathbf{1}^\top P(t)^\top$ with $W(0) = C(0)$.
Then for all $t \ge 1$,

$$C(t) = \sum_{r=0}^{t} Q(r,t)W(r)Q(r,t)^\top. \tag{10}$$

Here the product is ordered with later indices on the left, $Q(r,t) = P(t)P(t-1)\cdots P(r+1)$, and $Q(t,t) = I$.

Proof. First note that equation (9) becomes simply $C(t) = W(t) + P(t)C(t-1)P(t)^\top$. The base of the
induction $t = 1$ is now immediate, since $W(0) = C(0)$. Now assume that it holds up to time $t$. Then

$$\begin{aligned}
C(t+1) &= W(t+1) + P(t+1)C(t)P(t+1)^\top \\
&= W(t+1) + P(t+1)\left[\sum_{r=0}^{t} Q(r,t)W(r)Q(r,t)^\top\right]P(t+1)^\top \\
&= W(t+1) + \sum_{r=0}^{t} Q(r,t+1)W(r)Q(r,t+1)^\top \\
&= \sum_{r=0}^{t+1} Q(r,t+1)W(r)Q(r,t+1)^\top.
\end{aligned}$$

□



5.2 Minimum variance unbiased linear estimator 



We show in this subsection how in general a minimum variance unbiased linear estimator is calculated, given
a collection of estimators with a known covariance matrix.

Let $X$ be a random variable and let $(Z_1, \ldots, Z_n)$ be random variables such that $\mathbb{E}[Z_i \mid X] = X$ for all
$i \in [n]$. Let $C_{ij} = \mathrm{Cov}[Z_i - X, Z_j - X]$, with $C$ being the matrix with entries $C_{ij}$.

Let $M = \sum_i b_i Z_i$ be the minimum variance unbiased linear estimator of $X$, i.e., let $(b_1, \ldots, b_n)$ minimize
$\mathrm{Var}[M - X]$ under the constraint that $\mathbb{E}[M \mid X] = X$, which is equivalent to $\sum_i b_i = 1$, since $\mathbb{E}[Z_i \mid X] = X$
for all $i$.

Denote $\mathbf{b} = (b_1, \ldots, b_n)$.



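Proposition 2.

$$\mathbf{b} = \frac{C^{-1}\mathbf{1}}{\mathbf{1}^\top C^{-1}\mathbf{1}}. \tag{11}$$

Proof. By definition

$$\mathrm{Var}[M - X] = \mathrm{Var}\Big[\sum_i b_i Z_i - X\Big] = \mathrm{Cov}\Big[\sum_i b_i Z_i - X, \sum_j b_j Z_j - X\Big].$$

Since $\sum_i b_i = 1$ we can write

$$\mathrm{Var}[M - X] = \mathrm{Cov}\Big[\sum_i b_i (Z_i - X), \sum_j b_j (Z_j - X)\Big],$$

and then by the bilinearity of covariance we have that

$$\mathrm{Var}[M - X] = \sum_{ij} b_i b_j \mathrm{Cov}[Z_i - X, Z_j - X] = \mathbf{b}^\top C\mathbf{b}.$$

Note that we again used here the fact that $\sum_i b_i = 1$.

To find $\mathbf{b}$ that minimizes this expression under the constraint that $\sum_i b_i = 1$ we use Lagrange multipliers
to minimize

$$\mathbf{b}^\top C\mathbf{b} + \lambda(\mathbf{1}^\top\mathbf{b} - 1),$$

which is a straightforward calculation yielding Eq. (11). □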

We assumed in this proof that $C$ is an invertible matrix. When $C$ is not invertible then it is easy
to show that the same statement holds, with $C^{-1}$ being a pseudo-inverse of $C$. While for different such
pseudo-inverses one gets different values of $\mathbf{b}$, the variance of the different $M$'s is identical.
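A minimal sketch of the computation in Proposition 2 follows. It assumes an invertible $C$ and solves $C\mathbf{w} = \mathbf{1}$ by Gaussian elimination rather than forming the inverse; the helper names `solve` and `mvule_weights` are hypothetical.

```python
def solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def mvule_weights(C):
    """Weights b = C^{-1} 1 / (1^T C^{-1} 1), assuming C is invertible."""
    w = solve(C, [1.0] * len(C))
    total = sum(w)
    return [wi / total for wi in w]

def variance(b, C):
    """Evaluate b^T C b, the variance of the combined estimator."""
    n = len(b)
    return sum(b[i] * C[i][j] * b[j] for i in range(n) for j in range(n))

b_corr = mvule_weights([[2.0, 1.0], [1.0, 3.0]])   # correlated estimators
b_diag = mvule_weights([[1.0, 0.0], [0.0, 4.0]])   # independent estimators
```

In the correlated case the weights are $(2/3, 1/3)$ with variance $5/3$; in the diagonal case they are proportional to the inverse variances.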

The following two corollaries follow directly from Proposition 2. 

Corollary 3. If $C$ is a diagonal matrix, then $b_i$, the weight given to each variable $Z_i$ in the minimum
variance unbiased estimator, is proportional to $\mathrm{Var}[Z_i - X]^{-1}$, and the variance of the minimum variance
unbiased estimator is $1/\sum_i \mathrm{Var}[Z_i - X]^{-1}$.

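Corollary 4. If

$$C = \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix}$$

then the minimum variance unbiased estimator is

$$M = \frac{\sigma_2^2 Z_1 + \sigma_1^2 Z_2}{\sigma_1^2 + \sigma_2^2},$$

with

$$\mathrm{Var}[M - X] = \frac{\sigma_1^2\sigma_2^2}{\sigma_1^2 + \sigma_2^2}.$$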

Note that in the best response model $Y_i(t)$ is the minimum variance unbiased linear estimator of $S(t)$
given $M_i(t)$ and $\{Y_j(t-1)\}$. Hence to calculate it, it suffices to know the covariances of these estimators. It
follows from the definitions that

$$\mathrm{Cov}[Y_j(t-1) - S(t), Y_k(t-1) - S(t)] = C_{jk}(t-1) + \sigma^2,$$

$$\mathrm{Cov}[Y_j(t-1) - S(t), M_i(t) - S(t)] = 0,$$

and

$$\mathrm{Cov}[M_i(t) - S(t), M_i(t) - S(t)] = \tau_i^2.$$

Thus knowing $C(t-1)$, $\sigma$ and $\{\tau_i\}$ is sufficient to calculate the coefficients $A_i(t)$ and $\{P_{ij}(t)\}_j$ in the best
response model.

5.3 Best response with a single agent 

We provide the following proposition without proof. It is a consequence of basic Kalman filter theory [19];
it is shown there that, for $n = 1$, the MVULE of $S(t)$ given all the measurements up to time $t$ is identical to
the MVULE of $S(t)$ given the new measurement at time $t$ and the previous estimator. Formally:

Proposition 5. Best response achieves perfect learning when $n = 1$.
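For a single agent the variance of the best-response estimator follows a scalar recursion: the previous estimate has variance $v + \sigma^2$ as an estimate of $S(t)$, and it is combined by inverse-variance weighting with a measurement of variance $\tau^2$. The sketch below iterates this standard scalar Kalman recursion; the function name and the numerical values are illustrative assumptions.

```python
import math

def kalman_variance_step(v, sigma2, tau2):
    """Variance of the n = 1 best-response estimator after one round."""
    x = v + sigma2                 # variance of Y(t-1) as an estimate of S(t)
    return x * tau2 / (x + tau2)   # optimal combination with M(t)

sigma2, tau2 = 1.0, 4.0
v = 10.0                           # arbitrary starting variance
for _ in range(100):
    v = kalman_variance_step(v, sigma2, tau2)

# The fixed point solves v^2 + sigma2 * v - sigma2 * tau2 = 0.
v_star = (-sigma2 + math.sqrt(sigma2 ** 2 + 4 * sigma2 * tau2)) / 2
```

After the loop `v` matches `v_star`, the positive root of the quadratic, regardless of the starting variance chosen.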



6 Best response dynamics 

Recall that in best response dynamics at time $t > 0$ agent $i$ chooses $A_i(t)$ and $\{P_{ij}(t)\}_j$ that minimize $C_{ii}(t)$,
under the constraints that $P_{ij}(t) = 0$ if $j \notin \partial i$, and $A_i(t) + \sum_j P_{ij}(t) = 1$. Thus $Y_i(t)$ is in fact the MVULE
(see definition in Section 3.1.1) of $S(t)$, given $M_i(t)$ and $\{Y_j(t-1)\}_{j \in \partial i}$.

Note that to calculate $A_i(t)$ and $\{P_{ij}(t)\}_j$ it is necessary (and, in fact, sufficient, as we note in Section 5.2)
to know $C(t-1)$, $\sigma$ and $\{\tau_i\}$, and so this model indeed assumes that the agents know the covariance matrix
of their neighbors' estimators.

6.1 Understanding best-response dynamics

The condition that estimators are unbiased, or $A_i(t) + \sum_j P_{ij}(t) = 1$, means that given $\{P_{ij}(t)\}_j$ one can
calculate $A_i(t)$, or alternatively given $P$ one can calculate $A$. Hence, fixing $\sigma$ and $\{\tau_i\}$, $P(t)$ is a deterministic
function of $C(t-1)$. Since by Eq. (9) $C(t)$ is a function of $A(t)$, $P(t)$ and $C(t-1)$, then under best response
dynamics, $C(t)$ is in fact a function of $C(t-1)$. We will denote this function by $F$, so that $C(t) = F(C(t-1))$.
Our goal is to understand this map $F$, and in particular to determine its limiting behavior.

We next analyze in more detail the best response calculation for agent $i$. This can conceptually be divided
into two stages: calculating a best estimator for $S(t)$ from $\mathbf{y}(t-1)$, and then combining that with $M_i(t)$ for
a new estimator of $S(t)$.

Let the vector $\mathbf{y}_{\partial i}(t-1) = \{Y_j(t-1) \mid j \in \partial i\}$ and let $C_i(t-1) = C_{\partial i, \partial i}(t-1)$ be the covariance matrix
of the estimators of the neighbors of agent $i$.

Denote by $\mathbf{q}_i(t)$ the vector of coefficients for $\mathbf{y}_{\partial i}(t-1)$ that make $Z_i = \mathbf{q}_i(t)^\top \mathbf{y}_{\partial i}(t-1)$ a minimum
variance unbiased linear estimator for $S(t)$; note that this is also the estimator for $S(t-1)$. Then by
Proposition 2 we have that

$$\mathbf{q}_i(t)^\top = \beta_i(t-1)\mathbf{1}^\top C_i(t-1)^{-1},$$

where $\beta_i(t-1) = 1/(\mathbf{1}^\top C_i(t-1)^{-1}\mathbf{1})$. It is easy to see that $\mathrm{Var}[Z_i - S(t-1)] = \beta_i(t-1)$ and thus

$$\mathrm{Var}[Z_i - S(t)] = \beta_i(t-1) + \sigma^2.$$

$M_i(t)$ is an independent estimator of $S(t)$ with variance $\tau_i^2$. To combine it optimally with $Z_i$ we set

$$A_i(t) = \frac{\beta_i(t-1) + \sigma^2}{\beta_i(t-1) + \sigma^2 + \tau_i^2} \tag{12}$$

by Corollary 4. The optimal weight vector $\mathbf{p}_i(t)$ for agent $i$ (i.e., $\{P_{ij}\}_{j \in \partial i}$) is therefore $\mathbf{p}_i(t) = (1 - A_i(t))\mathbf{q}_i(t)$.
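The two-stage calculation above can be sketched as follows for one agent: first the MVULE weights over the neighbors' previous estimates, then the inverse-variance combination with the new measurement. The function names, the dictionary return format, and the example covariance matrix are assumptions made for illustration.

```python
def solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def best_response_weights(C_prev, neighbors, sigma2, tau2_i):
    """Return (A_i, {j: P_ij}) for one agent, assuming C_i is invertible."""
    C_i = [[C_prev[j][k] for k in neighbors] for j in neighbors]
    w = solve(C_i, [1.0] * len(neighbors))       # C_i^{-1} 1
    beta_i = 1.0 / sum(w)                        # 1 / (1^T C_i^{-1} 1)
    q = [beta_i * wj for wj in w]                # stage 1: weights on neighbors
    A_i = (beta_i + sigma2) / (beta_i + sigma2 + tau2_i)   # stage 2
    return A_i, {j: (1.0 - A_i) * qj for j, qj in zip(neighbors, q)}

# Three agents with independent previous estimates of variance 2;
# agent 0 observes itself and agent 1 only.
C_prev = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
A_0, P_0 = best_response_weights(C_prev, [0, 1], sigma2=1.0, tau2_i=2.0)
```

In this example $\beta_0 = 1$, so the measurement receives weight $1/2$ and each observed neighbor receives weight $1/4$; the weights sum to one as the unbiasedness condition requires.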

6.2 Complete graph case 

When $G$ is the complete graph, the agents best-respond similarly, since they all observe the same set of
estimators from the previous iteration. We now have $C_i(t-1) = C(t-1)$, $\mathbf{q}_i(t) = \mathbf{q}(t)$, and $\beta_i(t-1) = \beta(t-1)$,
for all $i$. For the moment, we will suppress the $t$. Letting $\mathbf{a}$ be the vector with coefficients $A_i$, we then have
$P = (\mathbf{1} - \mathbf{a})\mathbf{q}^\top = \beta(\mathbf{1} - \mathbf{a})\mathbf{1}^\top C^{-1}$. Using this form for $P$, we can now see that $PCP^\top = \beta(\mathbf{1} - \mathbf{a})(\mathbf{1} - \mathbf{a})^\top$.
Putting this all together, and adding back the $t$, we have by Eq. (9) that

$$C(t) = A(t)^2 T + (\beta(t-1) + \sigma^2)(\mathbf{1} - \mathbf{a}(t))(\mathbf{1} - \mathbf{a}(t))^\top. \tag{13}$$

Since by equation (12), $A_i(t)$ depends only on $\beta(t-1)$, $\tau_i$, and $\sigma$, we see that $C(t) = F(C(t-1))$ depends
on $C(t-1)$ only through $\beta(t-1) = 1/(\mathbf{1}^\top C(t-1)^{-1}\mathbf{1})$. Hence we can write $C(t) = C(\beta(t-1))$. We now see
that we can completely describe the state of the system by a single parameter $\beta$, and our map $F$ reduces to
the map $f : \beta \mapsto 1/(\mathbf{1}^\top C(\beta)^{-1}\mathbf{1})$. We wish to analyze this function $f$ as a single-parameter discrete dynamical
system.

To simplify our formula for $f$, we will make use of a matrix identity attributed to Woodbury and others
(cf. [16]):

Theorem 6 (Sherman-Morrison-Woodbury formula). For any $U \in \mathbb{R}^{n \times k}$, $V \in \mathbb{R}^{k \times n}$, and nonsingular
$X \in \mathbb{R}^{n \times n}$, $Y \in \mathbb{R}^{k \times k}$ such that $Y^{-1} + VX^{-1}U$ is nonsingular,

$$(X + UYV)^{-1} = X^{-1} - X^{-1}U(Y^{-1} + VX^{-1}U)^{-1}VX^{-1}. \tag{14}$$
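The rank-one case of Eq. (14), with $k = 1$, $Y = 1$, $U = \mathbf{u}$, $V = \mathbf{u}^\top$ and a diagonal $X$, is the case used in this section. The sketch below checks it numerically by multiplying the matrix with its claimed inverse; the function name `rank_one_inverse` and the test vectors are assumptions.

```python
def rank_one_inverse(d, u):
    """Inverse of Diag(d) + u u^T via Eq. (14) with k = 1 and Y = 1."""
    n = len(d)
    xu = [u[i] / d[i] for i in range(n)]                # X^{-1} u
    denom = 1.0 + sum(u[i] * xu[i] for i in range(n))   # Y^{-1} + V X^{-1} U
    return [[(1.0 / d[i] if i == j else 0.0) - xu[i] * xu[j] / denom
             for j in range(n)] for i in range(n)]

d, u = [1.0, 2.0, 4.0], [0.5, -1.0, 2.0]
# Build Diag(d) + u u^T explicitly and multiply by the claimed inverse.
M = [[(d[i] if i == j else 0.0) + u[i] * u[j] for j in range(3)]
     for i in range(3)]
inv = rank_one_inverse(d, u)
product = [[sum(M[i][k] * inv[k][j] for k in range(3)) for j in range(3)]
           for i in range(3)]
```

The entries of `product` agree with the identity matrix up to floating-point error.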

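Using the formula in Eq. (14), we can express $f$ in terms of $\beta$:

Lemma 7. Let $y = \sum_i \tau_i^2$ and $z = (\sum_i \tau_i^2)(\sum_i \tau_i^{-2})$. Then $f(\beta)$ has the following form:

$$f(\beta) = \frac{y(\beta + \sigma^2)(y + \beta + \sigma^2)}{y(y - (n-2)n(\beta + \sigma^2)) + z(\beta + \sigma^2)(y + \beta + \sigma^2)}. \tag{15}$$

Proof. Let $x = \beta + \sigma^2$ for brevity. We will compute $\mathbf{1}^\top C(\beta)^{-1}\mathbf{1}$ by applying the matrix identity (14) to
equation (13), with $k = 1$, $X = A^2 T$, $Y = I_{1 \times 1}$, and $U^\top = V = \sqrt{x}(\mathbf{1} - \mathbf{a})^\top$. This gives us:

$$\begin{aligned}
\mathbf{1}^\top C(\beta)^{-1}\mathbf{1} &= \mathbf{1}^\top\left(A^{-2}T^{-1} - x\frac{A^{-2}T^{-1}(\mathbf{1} - \mathbf{a})(\mathbf{1} - \mathbf{a})^\top A^{-2}T^{-1}}{1 + x(\mathbf{1} - \mathbf{a})^\top A^{-2}T^{-1}(\mathbf{1} - \mathbf{a})}\right)\mathbf{1} \\
&= \mathbf{1}^\top A^{-2}T^{-1}\mathbf{1} - \frac{x\left(\mathbf{1}^\top A^{-2}T^{-1}(\mathbf{1} - \mathbf{a})\right)^2}{1 + x(\mathbf{1} - \mathbf{a})^\top A^{-2}T^{-1}(\mathbf{1} - \mathbf{a})}.
\end{aligned} \tag{16}$$

We have the identities

$$\begin{aligned}
\mathbf{1}^\top A^{-2}T^{-1}\mathbf{1} &= yx^{-2} + 2nx^{-1} + z/y, \\
\mathbf{1}^\top A^{-2}T^{-1}(\mathbf{1} - \mathbf{a}) &= nx^{-1} + yx^{-2}, \\
(\mathbf{1} - \mathbf{a})^\top A^{-2}T^{-1}(\mathbf{1} - \mathbf{a}) &= yx^{-2}
\end{aligned} \tag{17}$$

from the expression (12) for $\mathbf{a}$. These identities allow us to simplify equation (16):

$$\mathbf{1}^\top C(\beta)^{-1}\mathbf{1} = yx^{-2} + 2nx^{-1} + \frac{z}{y} - \frac{(nx + y)^2}{x^2(x + y)}.$$

Finally, setting $f(\beta) = 1/(\mathbf{1}^\top C(\beta)^{-1}\mathbf{1})$ and simplifying gives us the result. □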
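As a sanity check on the closed form of Lemma 7, the sketch below compares it with a direct evaluation of $1/(\mathbf{1}^\top C(\beta)^{-1}\mathbf{1})$, building $C(\beta)$ from Eqs. (12) and (13) and solving a linear system. The helper names and the parameter values are illustrative assumptions.

```python
def solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def f_closed(beta, sigma2, tau2):
    """Closed form of Lemma 7; tau2 is the list of tau_i^2."""
    n = len(tau2)
    x = beta + sigma2
    y = sum(tau2)
    z = y * sum(1.0 / t for t in tau2)
    D = y * (y - (n - 2) * n * x) + x * z * (x + y)
    return x * y * (x + y) / D

def f_direct(beta, sigma2, tau2):
    """Build C(beta) from Eqs. (12)-(13) and evaluate 1 / (1^T C^{-1} 1)."""
    n = len(tau2)
    x = beta + sigma2
    a = [x / (x + t) for t in tau2]                       # Eq. (12)
    C = [[(a[i] ** 2 * tau2[i] if i == j else 0.0)
          + x * (1 - a[i]) * (1 - a[j]) for j in range(n)]
         for i in range(n)]                               # Eq. (13)
    return 1.0 / sum(solve(C, [1.0] * n))

tau2, sigma2 = [1.0, 2.0, 3.0], 0.5
gap = max(abs(f_closed(b, sigma2, tau2) - f_direct(b, sigma2, tau2))
          for b in (0.0, 0.7, 5.0))
```

The two evaluations agree to floating-point precision; for $n = 1$ the closed form reduces to $xy/(x + y)$, the single-agent recursion.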

We are now ready to prove the main theorem of this section. 

Theorem 8. When $G$ is a complete graph, best-response dynamics converge to a unique steady-state, for all
starting estimators $\mathbf{y}(0)$ and all choices of parameters $\{\tau_i\}$ and $\sigma$. Moreover, the convergence is fast, in the
sense that $-\log|\beta(t) - \beta^*| = \Omega(t)$, where $\beta^* = \lim_{t \to \infty} \beta(t)$.

Proof. We will make use of the Banach fixed point theorem [7] which states that if there exists some k < 1 
such that |/'(/3)| < k for all /?, then there is a unique fixed point /?* of/, and iterates of / satisfy |/*(/3)— /3*| < 
T3fcl/3 — /(/3)| for all starting points /3. Thus, given this theorem, we need only show |/'(/3)| < k for some 
k<l. 

First note that the n = 1 case reduces to a Kalman filter, which we review briefly in Section 5.3. 

We will find it convenient to think of the horizontal shift $g(x) = f(x - \sigma^2)$, where $x = \beta + \sigma^2$, but allow any $x \ge 0$. That is, $g$ can be thought of as taking the variance $x$ of the estimate of the process this round using only estimates from last round. We first compute $g$ and its first two derivatives; letting $D = y(y - (n-2)nx) + xz(x+y)$, we have

$$g(x) = xy(x+y)/D \qquad (18)$$
$$g'(x) = y^2\,(y - (n-2)x)(nx + y)/D^2 \qquad (19)$$
$$g''(x) = 2y^2\left((n-2)nx^3z + y^3\left((n-1)^2 - z\right) - 3x^2yz - 3xy^2z\right)/D^3 \qquad (20)$$

Before proving bounds on $f'$, we prove a few useful observations:

A. $z \ge n^2$. This follows from the Cauchy-Schwarz inequality, since $\sum_i \tau_i \tau_i^{-1} = n$.

B. $D > 0$. Expanding $D$, we have three strictly positive terms $2nxy + y^2 + x^2z$ plus $xyz - n^2xy$, which is non-negative by observation A.

C. $g'(x) < 0 \iff y < (n-2)x$. Since $D > 0$ by observation B, this follows from (19).

Upper bound: We will show that there exists some $k_U$ such that $f'(\beta) \le k_U < 1$ for all $\beta \ge 0$ by showing $g'(x) \le k_U$ for all $x \ge \sigma^2$. First, note that $g'(0) = 1$. Next, by observation C and some simple algebra, one can show that $g''(x) < 0$ as long as $g'(x) > 0$; that is, $g'$ is strictly decreasing while positive, until $x = y/(n-2)$. Hence, letting $k_U := f'(0)$, we have $k_U = g'(\sigma^2) < g'(0) = 1$. Finally, we have $f'(\beta) \le f'(0) = k_U$ for all $\beta \ge 0$, so $k_U$ is our upper bound.

Lower bound: To show the lower bound, we minimize $g'$ with respect to $x$ as well as all the parameters. Let us start with $z$. By observation C, we know that the minimum value of $g'(x)$ must occur when $x > y/(n-2)$, the region where $g'(x) < 0$ (note that if $n \le 2$ we get a trivial lower bound of 0, so henceforth we will assume $n > 2$). In this region, it is clear from observations A and B that the minimum of $g'$ with respect to $z$ occurs when $z = n^2$, where $D = (nx+y)^2$. Substituting and simplifying, we have

$$g'(x) \ge \frac{y^2\,(y - (n-2)x)}{(nx + y)^3} := h(n, x, y).$$

We next minimize over $x$: solving $\frac{\partial}{\partial x}h(n,x,y) = 0$ for $x$ yields $x = \frac{(2n-1)\,y}{(n-2)\,n}$, and one can check that indeed $\frac{\partial^2}{\partial x^2}h(n,x,y) > 0$ for this $x$. Substituting again, we are now left with

$$g'(x) \ge -\frac{(n-2)^3}{27n(n-1)^2},$$

which is now only a function of $n$. One can now easily see that $g' > -\frac{1}{27}$ for all $n, x, y, z$.

We have therefore shown that $|f'(\beta)| \le k := \max(f'(0), 1/27) < 1$ for all $\beta$ and all parameter values. □
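The contraction argument can be illustrated numerically. The following sketch is not part of the paper: it iterates the map $f$ of Lemma 7 from several starting points and estimates $|f'|$ by central differences on a grid. The parameter values, grid size, and function names (`iterate`, `max_abs_slope`) are arbitrary choices made for illustration.

```python
import math

def f(beta, sigma2, taus2):
    """Best-response variance map of Lemma 7 on the complete graph."""
    n = len(taus2)
    y = sum(taus2)
    z = y * sum(1.0 / t for t in taus2)
    x = beta + sigma2
    return y * x * (x + y) / (y * (y - (n - 2) * n * x) + z * x * (x + y))

def iterate(beta0, sigma2, taus2, rounds=60):
    """Apply f repeatedly, starting from beta0."""
    beta = beta0
    for _ in range(rounds):
        beta = f(beta, sigma2, taus2)
    return beta

def max_abs_slope(sigma2, taus2, grid=2000, beta_max=50.0, h=1e-6):
    """Largest central-difference estimate of |f'| on a grid of beta >= 0."""
    worst = 0.0
    for i in range(grid + 1):
        b = beta_max * i / grid
        slope = (f(b + h, sigma2, taus2) - f(b - h, sigma2, taus2)) / (2 * h)
        worst = max(worst, abs(slope))
    return worst

# Two agents with tau_i = sigma = 1: every start should reach the same limit.
limits = [iterate(b0, 1.0, [1.0, 1.0]) for b0 in (0.0, 1.0, 100.0)]
# Five heterogeneous agents: the estimated slope should stay below 1.
slope_bound = max_abs_slope(1.0, [0.5, 1.0, 2.0, 3.0, 4.0])
print(limits, slope_bound)
```

With two agents and $\tau_i = \sigma = 1$, the fixed-point equation reduces to $2\beta^3 + 7\beta^2 + 4\beta - 3 = 0$, whose positive root is $\sqrt{2} - 1$; the iterates above should agree with this value regardless of the start.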

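As a concluding comment we analyze the steady-state $\beta^*$. From the form of $f$, one can show (by clearing denominators in $\beta^* = f(\beta^*)$) that $\beta^*$ satisfies:

$$0 = y\sigma^2(y + \sigma^2) - \sigma^2\left(y(-(n-2)n + z - 2) + z\sigma^2\right)\beta + \left((n-1)^2y - z(y + 2\sigma^2)\right)\beta^2 - z\beta^3 \qquad (21)$$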

As a corollary of Theorem 8, this cubic polynomial has a unique positive root. 
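As a small illustration, the positive root of the steady-state cubic can be located by bisection. The sketch below uses the cubic obtained by clearing denominators in $\beta = f(\beta)$, namely $y\sigma^2(y+\sigma^2) - \sigma^2(y(-(n-2)n+z-2)+z\sigma^2)\beta + ((n-1)^2y - z(y+2\sigma^2))\beta^2 - z\beta^3$; the helper names are hypothetical, and the final variance formula $(\beta^*+1)/(\beta^*+2)$ assumes identical agents with $\tau_i = \sigma = 1$.

```python
import math

def cubic_coeffs(sigma2, taus2):
    """Coefficients (c0, c1, c2, c3) of c0 + c1*b + c2*b^2 + c3*b^3."""
    n = len(taus2)
    y = sum(taus2)
    z = y * sum(1.0 / t for t in taus2)
    c0 = y * sigma2 * (y + sigma2)
    c1 = -sigma2 * (y * (-(n - 2) * n + z - 2) + z * sigma2)
    c2 = (n - 1) ** 2 * y - z * (y + 2 * sigma2)
    c3 = -z
    return c0, c1, c2, c3

def positive_root(sigma2, taus2, tol=1e-13):
    """Bisection for the positive root; the cubic is > 0 at 0 and -> -inf."""
    c0, c1, c2, c3 = cubic_coeffs(sigma2, taus2)
    p = lambda b: c0 + b * (c1 + b * (c2 + b * c3))
    lo, hi = 0.0, 1.0
    while p(hi) > 0:          # grow the bracket until the sign changes
        hi *= 2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if p(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Two agents, tau_i = sigma = 1 (the setting used later in Theorem 13).
beta_star = positive_root(1.0, [1.0, 1.0])
# Variance of each agent's estimate: combine a prediction of variance
# beta* + 1 with an own measurement of variance 1.
steady_variance = (beta_star + 1.0) / (beta_star + 2.0)
print(beta_star, steady_variance)
```

In this two-agent case the root is $\sqrt{2}-1$ and the resulting per-agent variance is $2-\sqrt{2}$, the value quoted in the proof of Theorem 13.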



7 Fixed response dynamics 

Recall that in fixed response dynamics each agent $i$, at each round $t$, has access to its neighbors' estimators from the previous round, $\{Y_j(t-1) : j \in \partial i\}$, as well as its current measurement $M_i(t)$. The new estimates are $y(t) = A \cdot m(t) + P \cdot y(t-1)$, i.e., fixed convex linear combinations of these values.

We first show the system converges to a steady state: as $t$ tends to infinity the covariance matrix $C(t) = \mathrm{Var}[y(t) - \mathbf{1}S(t)]$ tends to some matrix $C$.

Note that as in Section 6.2 above, a result of the convexity condition is that the choice of P uniquely 
determines A. 

7.1 Convergence of fixed response dynamics 

To prove our theorem we shall need the following lemma, as well as an easy corollary of Proposition 1. 
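Lemma 9. Let $P_i \in \mathbb{R}^{n\times n}$ satisfy $\|P_i\|_\infty \le \gamma$ for all $1 \le i \le t$. Then for any $Q \in \mathbb{R}^{n\times n}$,

$$\left\|\left(\prod_{i=1}^{t} P_i\right) Q \left(\prod_{i=1}^{t} P_i\right)^{\top}\right\|_\infty \le n\gamma^{2t}\|Q\|_\infty.$$

Proof. This follows from two facts about the infinity norm. First, $\|\cdot\|_\infty$ is submultiplicative, meaning for all $A, B \in \mathbb{R}^{n\times n}$, $\|AB\|_\infty \le \|A\|_\infty\|B\|_\infty$. Second, for all $A$ we have $\|A\|_\infty \le n\|A^\top\|_\infty$. These two combined give us

$$\left\|\prod_{i=1}^{t} P_i\right\|_\infty \le \gamma^t \quad\text{and}\quad \left\|\left(\prod_{i=1}^{t} P_i\right)^{\top}\right\|_\infty \le n\left\|\prod_{i=1}^{t} P_i\right\|_\infty \le n\gamma^t,$$

and the result then follows from submultiplicativity. □

Corollary 10.

$$C(t) = P^t C(0) (P^\top)^t + \sum_{r=0}^{t-1} P^r\left(A^2T + \sigma^2 P\mathbf{1}\mathbf{1}^\top P^\top\right)(P^\top)^r. \qquad (22)$$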



The main theorem of this subsection is the following. 



Theorem 11. In fixed response dynamics, if $A_i > 0$ for all $i \in [n]$ then the system converges to a steady state $C = \lim_{t\to\infty} C(t)$ such that

$$C = A^2T + \sigma^2 P\mathbf{1}\mathbf{1}^\top P^\top + PCP^\top. \qquad (23)$$

In particular, $C$ is independent of the starting estimators $y(0)$.



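Proof. Let $\gamma = 1 - \min_i A_i < 1$. Then since the entries of $P$ are non-negative and row $i$ of $P$ sums to $1 - A_i$, the absolute row sums of $P$ are at most $\gamma$, so we have $\|P\|_\infty \le \gamma$. Letting $Z = A^2T + \sigma^2 P\mathbf{1}\mathbf{1}^\top P^\top$, we have by Corollary 10 and Lemma 9 that

$$\|C(t+1) - C(t)\|_\infty = \left\|\sum_{r=0}^{t} P^r Z (P^\top)^r + P^{t+1}C(0)(P^\top)^{t+1} - \sum_{r=0}^{t-1} P^r Z (P^\top)^r - P^t C(0)(P^\top)^t\right\|_\infty$$
$$= \left\|P^t\left(Z + PC(0)P^\top - C(0)\right)(P^\top)^t\right\|_\infty \le n\gamma^{2t}\left\|Z + PC(0)P^\top - C(0)\right\|_\infty.$$

Thus since $\gamma < 1$, we have $\lim_{t\to\infty}\|C(t+1) - C(t)\|_\infty = 0$. Moreover, for all $t$ we have

$$\|C(t)\|_\infty = \left\|\sum_{r=0}^{t-1} P^r Z (P^\top)^r + P^t C(0)(P^\top)^t\right\|_\infty \le \left(\|Z\|_\infty + \|C(0)\|_\infty\right)\sum_{r=0}^{t} n\gamma^{2r} \le n\,\frac{\|Z\|_\infty + \|C(0)\|_\infty}{1 - \gamma^2} < \infty,$$

so clearly $\lim_{t\to\infty}\|C(t)\|_\infty < \infty$. Thus, $\lim_{t\to\infty} C(t)$ exists, and (23) follows from the recurrence in equation (9).

To see that the choice of $C(0)$ is immaterial, consider the alternate sequence $\tilde{C}(t)$ resulting from another choice $\tilde{C}(0)$ (but the same $P$). By definition,

$$\tilde{C}(t) = \sum_{r=0}^{t-1} P^r Z (P^\top)^r + P^t \tilde{C}(0)(P^\top)^t.$$

By Lemma 9 we have that

$$\|C(t) - \tilde{C}(t)\|_\infty = \left\|P^t\left(C(0) - \tilde{C}(0)\right)(P^\top)^t\right\|_\infty \le n\gamma^{2t}\|C(0) - \tilde{C}(0)\|_\infty,$$

so we clearly have $\lim_{t\to\infty}\|C(t) - \tilde{C}(t)\|_\infty = 0$. Thus, $\lim_{t\to\infty}\tilde{C}(t) = \lim_{t\to\infty} C(t) = C$. □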
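The statement of Theorem 11 can be illustrated by iterating the covariance recurrence $C(t) = A^2T + \sigma^2 P\mathbf{1}\mathbf{1}^\top P^\top + PC(t-1)P^\top$ from two very different starting matrices. The sketch below is a minimal example with made-up weights for three agents (each row of `P` sums to $1 - A_i$); it is not taken from the paper, and the helper names are hypothetical.

```python
def matmul(X, Y):
    """Plain-Python matrix product."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def step(C, A, P, taus2, sigma2):
    """One round of C(t) = A^2 T + sigma^2 P 1 1^T P^T + P C(t-1) P^T."""
    n = len(A)
    row_sums = [sum(row) for row in P]            # entries of the vector P 1
    PCPt = matmul(matmul(P, C), transpose(P))
    return [[(A[i] ** 2 * taus2[i] if i == j else 0.0)
             + sigma2 * row_sums[i] * row_sums[j]
             + PCPt[i][j]
             for j in range(n)] for i in range(n)]

def run(C0, A, P, taus2, sigma2, rounds=300):
    C = C0
    for _ in range(rounds):
        C = step(C, A, P, taus2, sigma2)
    return C

# Illustrative fixed responses: A_i > 0 and row i of P sums to 1 - A_i.
A = [0.5, 0.3, 0.4]
P = [[0.2, 0.2, 0.1], [0.3, 0.2, 0.2], [0.1, 0.3, 0.2]]
taus2 = [1.0, 2.0, 0.5]
sigma2 = 1.0

zero = [[0.0] * 3 for _ in range(3)]
big = [[100.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
C_a = run(zero, A, P, taus2, sigma2)
C_b = run(big, A, P, taus2, sigma2)

# Distance between the two limits, and residual of the steady-state equation.
gap = max(abs(C_a[i][j] - C_b[i][j]) for i in range(3) for j in range(3))
C_next = step(C_a, A, P, taus2, sigma2)
residual = max(abs(C_a[i][j] - C_next[i][j]) for i in range(3) for j in range(3))
print(gap, residual)
```

Here $\gamma = 0.7$, so the influence of $C(0)$ decays like $\gamma^{2t}$ and both runs should end at the same matrix, which in turn should satisfy equation (23) up to rounding.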

7.2 Non-optimality of best response steady-state 

By Theorem 8 best response dynamics converges to a unique steady state. The next result shows that 
although, in best responding, agents minimize the variance of their estimators, in some cases they can 
converge to a steady state with lower variances by an appropriate choice of a fixed response. I.e., by
cooperating the agents can achieve better results than by greedily choosing the short-term minimum.

We consider the case of the complete graph over $n$ players where the agents' measurement errors $\tau_i$ are all the same and equal $\tau = 1$, and the standard deviation of the state's random walk is also $\sigma = 1$.

Before proving the main theorem of this subsection we establish the following lemma.

Lemma 12. Let $C^{(n)}$ be the steady state of fixed response dynamics on the complete graph with $n$ agents, $\tau_i = \sigma = 1$, $A_i = \alpha$ for all $i$ and $P_{ij} = (1-\alpha)/n$ for all $i$ and $j$. Then

$$C^{(n)}_{ii} = \alpha^2 + \frac{(1-\alpha)^2(1 + \alpha^2/n)}{(2-\alpha)\alpha}. \qquad (24)$$

Proof. Let $\beta^{(n)} = 1/\mathbf{1}^\top (C^{(n)})^{-1}\mathbf{1}$, and let $Z_n(t) = \frac{1}{n}\sum_i Y_i(t)$, so that

$$\beta^{(n)} = \lim_{t\to\infty} \mathrm{Var}[Z_n(t) - S(t)].$$

By the symmetry of the model we have that $Y_i(t+1) = (1-\alpha)Z_n(t) + \alpha M_i(t+1)$, and so

$$Z_n(t+1) = (1-\alpha)Z_n(t) + \alpha\,\frac{1}{n}\sum_i M_i(t+1).$$

Therefore

$$\mathrm{Var}[Z_n(t+1) - S(t+1)] = \frac{\alpha^2\tau^2}{n} + (1-\alpha)^2\left(\mathrm{Var}[Z_n(t) - S(t)] + \sigma^2\right),$$

which, since $\tau = \sigma = 1$, implies by simple manipulation that

$$\beta^{(n)} = \frac{\alpha^2/n + (1-\alpha)^2}{1 - (1-\alpha)^2}.$$

Finally, because $Y_i(t) = \alpha M_i(t) + (1-\alpha)Z_n(t-1)$, we have

$$C^{(n)}_{ii} = \lim_{t\to\infty}\mathrm{Var}[Y_i(t) - S(t)] = \alpha^2 + (1-\alpha)^2\left(\beta^{(n)} + 1\right) = \alpha^2 + \frac{(1-\alpha)^2(1 + \alpha^2/n)}{(2-\alpha)\alpha}.$$

□

Theorem 13. Let $G$ be a graph with $n$ vertices. Fix $\sigma$ and $\{\tau_i\}_{i\in[n]}$.

Consider best response dynamics for $n$ agents on $G$ with $\sigma$ and $\{\tau_i\}_{i\in[n]}$. Let $C^{br}$ denote the steady state the system converges to.

Consider fixed response dynamics with some $P$ and $A$ for $n$ agents on $G$ with $\sigma$ and $\{\tau_i\}_{i\in[n]}$. Let $C^{fr}$ denote the steady state the system converges to.

Then there exists a choice of $n$, $G$, $\sigma$, $\{\tau_i\}$, $A$ and $P$ such that $C^{br}_{ii} > C^{fr}_{ii}$ for all $i \in [n]$.

Proof. Let $n = 2$, let $G$ be the complete graph on two vertices, let $\sigma = 1$ and let $\tau_i = 1$ for all $i \in [n]$. Let $A_i = \alpha$ for all $i$ and $P_{ij} = (1-\alpha)/n$ for all $i$ and $j$. Then by Lemma 12 we have that

$$C^{fr}_{ii} = \alpha^2 + \frac{(1-\alpha)^2(1 + \alpha^2/2)}{(2-\alpha)\alpha}$$

for all $i$.

By Eq. (21) we have that

$$C^{br}_{ii} = 2 - \sqrt{2} \approx 0.58578$$

for all $i$.

It is easy to verify that for $\alpha = 0.60352$, for example (which is in fact the minimum), it holds that $C^{fr}_{ii} \approx 0.58472$ and so $C^{br}_{ii} > C^{fr}_{ii}$. □
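The numbers in this proof are easy to reproduce. The sketch below evaluates the fixed-response steady-state variance $\alpha^2 + (1-\alpha)^2(1+\alpha^2/n)/((2-\alpha)\alpha)$ for $\tau_i = \sigma = 1$, cross-checks it against a direct iteration of the variance recursion for the average $Z_n$, and searches a grid of $\alpha$ values for the minimum. The function names and the grid resolution are illustrative choices, not the authors' procedure.

```python
import math

def fixed_response_variance(alpha, n):
    """Closed-form steady-state C_ii for tau_i = sigma = 1."""
    return alpha ** 2 + (1 - alpha) ** 2 * (1 + alpha ** 2 / n) / ((2 - alpha) * alpha)

def simulate_fixed_response(alpha, n, rounds=500):
    """Iterate Var[Z_n(t) - S(t)] to its limit, then return C_ii."""
    v = 0.0  # arbitrary starting variance of the average
    for _ in range(rounds):
        v = alpha ** 2 / n + (1 - alpha) ** 2 * (v + 1.0)
    return alpha ** 2 + (1 - alpha) ** 2 * (v + 1.0)

# Best-response steady state for n = 2, as quoted in the proof.
best_response = 2 - math.sqrt(2)

# Grid search for the variance-minimizing fixed response with n = 2.
alphas = [i / 100000 for i in range(1, 100000)]
alpha_min = min(alphas, key=lambda a: fixed_response_variance(a, 2))
fr_min = fixed_response_variance(alpha_min, 2)
print(alpha_min, fr_min, best_response)
```

The grid minimum should land near $\alpha = 0.60352$ with a variance slightly below $2 - \sqrt{2}$, matching the comparison made in the proof.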

We note that the choice $n = 2$ was made to make the proof above simple, rather than being a pathological example; we now show that the same holds for $n$ large enough.

Lemma 14. Let $C^{br,(n)}$ be the steady state of best response dynamics on the complete graph with $n$ agents and $\tau_i = \sigma = 1$. Then

$$\lim_{n\to\infty} C^{br,(n)}_{ii} = \frac{\beta + 1}{\beta + 2}, \quad\text{where}\quad \beta = \tfrac{2}{3}\sqrt{7}\,\cos\left(\tfrac{1}{3}\tan^{-1}\left(3\sqrt{3}\right)\right) - \tfrac{4}{3}.$$

We omit the proof and mention that it follows directly by substitution into, and solution of, the cubic 
polynomial of Eq. (21), and Eq. (12). We likewise omit the proof of the following lemma, which is an 
immediate corollary of Lemma 12 above. 

Lemma 15. Let $C^{fr,(n)}$ be the steady state of fixed response dynamics on the complete graph with $n$ agents, $\tau_i = \sigma = 1$, $A_i = \alpha$ for all $i$, and $P_{ij} = (1-\alpha)/n$ for all $i$ and $j$. Then

$$\lim_{n\to\infty} C^{fr,(n)}_{ii} = \alpha^2 + \frac{1}{\alpha(2-\alpha)} - 1.$$

Hence setting $\alpha_\infty = 0.59075$ (again the minimum) we get

$$\lim_{n\to\infty} C^{fr,(n)}_{ii} \approx 0.55017,$$

and using Lemma 14 it is easy to numerically verify that

$$\lim_{n\to\infty} C^{br,(n)}_{ii} \approx 0.55496.$$

Thus we have shown that for large enough $n$ it again holds that $C^{br}_{ii} > C^{fr}_{ii}$ on the complete graph, for the appropriate choice of parameters. We conjecture that this is in fact achievable for all $n > 1$, with the correct choice of parameters. In the case of $n = 1$ best response is optimal by Proposition 5.
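The two large-$n$ limits can be evaluated directly. In the sketch below, $\beta$ is taken to be the positive root of $\beta^3 + 4\beta^2 + 3\beta - 1 = 0$, which is what the steady-state cubic (21) reduces to after dividing by $n^2$ and letting $n \to \infty$ with $\tau_i = \sigma = 1$ (so $y = n$, $z = n^2$); the trigonometric expression is the standard closed form for that root. Variable and function names are illustrative.

```python
import math

# Positive root of b^3 + 4 b^2 + 3 b - 1 = 0 via the trigonometric formula.
beta = (2.0 / 3.0) * math.sqrt(7) * math.cos(math.atan(3 * math.sqrt(3)) / 3) - 4.0 / 3.0

# Limiting best-response variance: prediction variance beta + 1 combined
# with an own measurement of variance 1.
br_limit = (beta + 1) / (beta + 2)

def fr_limit(alpha):
    """Limiting fixed-response variance as n grows (tau_i = sigma = 1)."""
    return alpha ** 2 + 1.0 / (alpha * (2 - alpha)) - 1.0

cubic_residual = beta ** 3 + 4 * beta ** 2 + 3 * beta - 1
print(beta, br_limit, fr_limit(0.59075), cubic_residual)
```

Both limits stay strictly above $\sigma^2\tau^2/(\sigma^2+\tau^2) = 0.5$, the benchmark that appears in the next subsection.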

7.3 Socially asymptotic learning 

Recall that in the complete graph setting with fixed $\sigma$ and $\tau$ and $\tau_i < \tau$ for all $i \in [n]$ we say that a dynamics is socially asymptotically learning if the variance of each agent's estimator approaches $\frac{\sigma^2\tau_i^2}{\sigma^2+\tau_i^2}$ as the number of agents increases.

Surprisingly, no fixed response dynamics can achieve socially asymptotic learning unless either $\sigma = 0$ or $\tau = 0$. This, of course, includes the steady state of the best response dynamics. In the case that $\sigma = 0$ the value of $S(t)$ is constant over time, and we are in the DeGroot model which is known to converge. In the case that $\tau = 0$ each agent receives the exact value of $S(t)$ in each round and can simply set $Y_i(t) = M_i(t)$ to asymptotically learn.

Theorem 16. If $\sigma, \tau > 0$, no fixed response dynamics can achieve socially asymptotic learning.

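Proof. For the sake of contradiction, assume that there exists some fixed response scheme that permits socially asymptotic learning, so that in the limit $C_{ii} = \frac{\sigma^2\tau_i^2}{\sigma^2+\tau_i^2}$ for all $i \in [n]$. We analyze Equation 9, which states that $C(t) = A(t)^2T + \sigma^2P(t)\mathbf{1}\mathbf{1}^\top P(t)^\top + P(t)C(t-1)P(t)^\top$.

It is easy to see that each of the three terms on the right hand side of Equation 9 is a positive semidefinite matrix and thus will have non-negative diagonal entries. By our assumption that we have a socially asymptotic learner, we know that the sum of the $i,i$ entries in the three matrices on the right hand side of Equation 9 equals $\frac{\sigma^2\tau_i^2}{\sigma^2+\tau_i^2}$ in the limit. Now, $(P(t)\mathbf{1}\mathbf{1}^\top P(t)^\top)_{i,i} = \sum_{j,k}P_{ij}P_{ik} = \sum_j P_{ij}\sum_k P_{ik} = (1 - a_i)^2$, where $a_i$ is the weight $A_i(t)$ that agent $i$ places on its own measurement.

By fixing $\varepsilon_i$ so that $a_i = \frac{\sigma^2}{\sigma^2+\tau_i^2} + \varepsilon_i$ we see that in the limit

$$\frac{\sigma^2\tau_i^2}{\sigma^2+\tau_i^2} = C_{ii} = (A(t)^2T)_{i,i} + (\sigma^2P(t)\mathbf{1}\mathbf{1}^\top P(t)^\top)_{i,i} + (P(t)C(t-1)P(t)^\top)_{i,i}$$
$$= a_i^2\tau_i^2 + (1-a_i)^2\sigma^2 + (P(t)C(t-1)P(t)^\top)_{i,i}$$
$$= \frac{\sigma^2\tau_i^2}{\sigma^2+\tau_i^2} + \varepsilon_i^2(\sigma^2+\tau_i^2) + (P(t)C(t-1)P(t)^\top)_{i,i}, \qquad (25)$$

and we have that in the limit for all $i \in [n]$: $\varepsilon_i$ goes to 0, $(P(t)C(t-1)P(t)^\top)_{i,i}$ goes to 0, and $a_i$ goes to $\frac{\sigma^2}{\sigma^2+\tau_i^2}$.

We will now show that it cannot be that both $a_i$ goes to $\frac{\sigma^2}{\sigma^2+\tau_i^2}$ and that $(P(t)C(t-1)P(t)^\top)_{i,i}$ goes to 0. Another application of Equation 9 yields that

$$(P(t)C(t-1)P(t)^\top)_{i,i} = (P(t)A(t-1)^2TP(t)^\top)_{i,i} + \sigma^2(P(t)P(t-1)\mathbf{1}\mathbf{1}^\top P(t-1)^\top P(t)^\top)_{i,i} + (P(t)P(t-1)C(t-2)P(t-1)^\top P(t)^\top)_{i,i},$$

and again all the matrices on the right hand side are positive semi-definite and thus have non-negative diagonals. Thus:

$$(P(t)C(t-1)P(t)^\top)_{i,i} \ge \sigma^2(P(t)P(t-1)\mathbf{1}\mathbf{1}^\top P(t-1)^\top P(t)^\top)_{i,i} = \sigma^2\sum_{j,k,l,m}P_{ij}P_{jk}P_{il}P_{lm}$$
$$= \sigma^2\left(\sum_j P_{ij}\sum_k P_{jk}\right)\left(\sum_l P_{il}\sum_m P_{lm}\right) \ge \sigma^2(1-a_i)^2\min_j(1-a_j)^2 > 0,$$

and this is a contradiction because we saw that $(P(t)C(t-1)P(t)^\top)_{i,i}$ tends to 0. The last inequality holds in the limit because $a_i$ tends to $\frac{\sigma^2}{\sigma^2+\tau_i^2}$ and because $\tau_i < \tau$ for all $i$. □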
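The step leading to equation (25) rests on a short expansion that may be worth spelling out. Writing $a_i = a_0 + \varepsilon_i$ with $a_0 = \sigma^2/(\sigma^2+\tau_i^2)$, the following derivation (added here for clarity, not taken from the paper) shows why the first-order term in $\varepsilon_i$ vanishes:

```latex
\begin{align*}
a_i^2\tau_i^2 + (1-a_i)^2\sigma^2
  &= (a_0+\varepsilon_i)^2\tau_i^2 + (1-a_0-\varepsilon_i)^2\sigma^2 \\
  &= a_0^2\tau_i^2 + (1-a_0)^2\sigma^2
     + 2\varepsilon_i\bigl(a_0\tau_i^2 - (1-a_0)\sigma^2\bigr)
     + \varepsilon_i^2(\sigma^2+\tau_i^2) \\
  &= \frac{\sigma^2\tau_i^2}{\sigma^2+\tau_i^2}
     + \varepsilon_i^2(\sigma^2+\tau_i^2).
\end{align*}
```

The cross term drops out because $a_0\tau_i^2 = (1-a_0)\sigma^2 = \sigma^2\tau_i^2/(\sigma^2+\tau_i^2)$, and the constant term simplifies since $a_0^2\tau_i^2 + (1-a_0)^2\sigma^2 = \sigma^2\tau_i^2(\sigma^2+\tau_i^2)/(\sigma^2+\tau_i^2)^2$. In other words, $a_0$ is exactly the weight that minimizes the first two terms of Equation 9, so any remaining positive contribution from the third term rules out reaching the benchmark.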

8 Penultimate prediction dynamics 

In this section we consider players that can remember one value from the previous round. We show that, 
in the case of the complete graph, this allows the agents to learn substantially more efficiently than in the 
previous models. Also on the complete graph we show that this model features perfect learning. In general, 
it is not clear that the optimal strategy involving one remembered value must necessarily have this property. 

We call this model the penultimate prediction model because an agent using it is effectively trying to 
re-estimate the value of the underlying state in the previous round, disregarding its own new measurement 
from the current round. This may help discount the older information that contributed to the prediction of 
each neighbor in the previous round. 

As defined in Section 3.1, in this model each agent $i$ does the following at time $t$: (a) agent $i$ first picks $A_i$ and $\{P_{ij}\}_j$ that minimize $\mathrm{Var}[R_i(t) - S(t-1)]$ where $r(t) = A \cdot r(t-1) + P \cdot y(t-1)$, and (b) agent $i$ then chooses $k_i(t)$ that minimizes $\mathrm{Var}[Y_i(t) - S(t)]$ where $Y_i(t) = k_i(t)M_i(t) + (1 - k_i(t))R_i(t)$.



8.1 Complete graph case 

We show that the penultimate prediction model achieves perfect learning on the complete graph. This means 
that agent i learns S(t) as if she had access to every agent's measurements from all the previous rounds, 
rather than just her neighbors' estimators from the last round. 

Theorem 17. Penultimate prediction on the complete graph achieves perfect learning. 
Proof. Let

$$\bar{\tau}^2 = \left(\sum_{i\in[n]}\tau_i^{-2}\right)^{-1},$$

let $q_i = \bar{\tau}^2/\tau_i^2$ and let $M(t) = \sum_i q_iM_i(t)$ be the average of the measurements of the agents at time $t$, weighted by the inverse of their variance. Then $M(t)$ is the MVULE of $S(t)$ given $m(t)$ (see Corollary 3).

Let $E(t)$ be the MVULE of $S(t)$, given all that is known up to time $t$: $\{m(s) \mid s \le t\}$, together with $y(0)$. Let $V(t) = \mathrm{Var}[E(t) - S(t)]$.

Basic Kalman filter theory (see, e.g. [19]) shows that $E(t)$ can be written as

$$E(t) = (1 - K(t))E(t-1) + K(t)M(t), \qquad (26)$$

with

$$V(t+1) = \left(V(t) + \sigma^2\right)\left(1 - K(t+1)\right)$$

and $K(t) = \frac{V(t-1) + \sigma^2}{V(t-1) + \sigma^2 + \bar{\tau}^2}$. Note that $V(t)$ is deterministic.

We now prove by induction that $R_i(t) = E(t-1)$. The base case of $t = 1$ follows from definitions.

By our inductive hypothesis at step $t$ we have that $R_i(t) = E(t-1)$ and $\mathrm{Var}[R_i(t) - S(t-1)] = V(t-1)$. Hence $R_i(t)$ is identical for all agents and we can write $R(t) = R_i(t) = E(t-1)$. Because $R(t) - S(t-1)$ and $S(t) - S(t-1)$ are independent we have that

$$\mathrm{Var}[R(t) - S(t)] = \mathrm{Var}[R(t) - S(t-1)] + \sigma^2 = V(t-1) + \sigma^2.$$

Since $R(t) - S(t)$ and $M_i(t) - S(t)$ are independent, $\mathrm{Var}[M_i(t) - S(t)] = \tau_i^2$, and since $Y_i(t)$ is the MVULE of $S(t)$ given $R(t)$ and $M_i(t)$, then by Corollary 4, $Y_i(t)$ will satisfy

$$Y_i(t) = k_i(t)R(t) + (1 - k_i(t))M_i(t), \qquad (27)$$

where $k_i(t) = \frac{\tau_i^2}{V(t-1) + \sigma^2 + \tau_i^2}$ is deterministic. Thus $Y_i(t)$ is a deterministic function of $R(t)$ and $M_i(t)$, and more importantly $M_i(t)$ is a deterministic linear combination of $Y_i(t)$ and $R(t)$. Since $R(t+1)$ is the MVULE of $S(t)$ given $R(t)$ and $y(t)$, its variance is bounded from above by that of the MVULE of $S(t)$ given $R(t)$ and $m(t)$. But by Eq. 26 we have that the optimum is achieved by $E(t)$, and so $R(t+1) = E(t)$.

We have therefore established that $Y_i(t)$ is the MVULE of $S(t)$ given $E(t-1)$ and $M_i(t)$, where $E(t-1)$ is the MVULE of $S(t-1)$ given all the measurements up to time $t-1$. To complete the proof we note that by the Markov property of $S(t)$ this means that $Y_i(t)$ is the MVULE of $S(t)$ given all the measurements up to time $t-1$, together with $M_i(t)$.

□

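Corollary 18. In the complete graph case, for any fixed $\sigma$ and $\tau$ where $\tau_i < \tau$ for all $i \in [n]$, the penultimate prediction heuristic is a socially asymptotic learner.

Proof. By Theorem 17 we need only show that the optimal learner is socially asymptotic. Fix some agent $i$. At round $t$ if this agent is given $m(t-1)$, then, by Corollary 3, he can predict $S(t-1)$ with $R_i(t)$ such that $\mathrm{Var}[R_i(t) - S(t-1)] = \left(\sum_{j\in[n]}\tau_j^{-2}\right)^{-1}$. However, note that $\left(\sum_{j\in[n]}\tau_j^{-2}\right)^{-1} \le \tau^2/n$. Agent $i$ can then compute

$$Y_i(t) = \frac{\sigma^2}{\sigma^2+\tau_i^2}M_i(t) + \frac{\tau_i^2}{\sigma^2+\tau_i^2}R_i(t)$$

and it is easy to see that $\mathrm{Var}[Y_i(t) - S(t)] \le \frac{\sigma^2\tau_i^2}{\sigma^2+\tau_i^2} + \frac{\tau_i^4\tau^2}{n(\sigma^2+\tau_i^2)^2}$, which approaches $\frac{\sigma^2\tau_i^2}{\sigma^2+\tau_i^2}$ as $n$ grows. □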
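To see the corollary at work, the sketch below iterates the scalar Kalman variance recursion for $n$ identical agents with $\tau_i^2 = \tau^2$, so that the pooled measurement has variance $\tau^2/n$, and then combines the resulting prediction with one agent's own measurement. The function name, the initialization of the recursion, and the default parameters are assumptions made for this illustration only.

```python
def perfect_learning_variance(n, sigma2=1.0, tau2=1.0, rounds=200):
    """Steady-state Var[Y_i(t) - S(t)] under perfect learning, identical agents."""
    pooled = tau2 / n              # variance of the inverse-variance weighted mean
    v = pooled                     # assumed start: first estimate uses measurements only
    for _ in range(rounds):
        prior = v + sigma2         # variance after one random-walk step
        gain = prior / (prior + pooled)
        v = prior * (1 - gain)     # variance of E(t) after seeing all measurements
    prior = v + sigma2             # variance of R(t) = E(t-1) as a predictor of S(t)
    return prior * tau2 / (prior + tau2)  # optimal mix with the agent's own M_i(t)

benchmark = 1.0 * 1.0 / (1.0 + 1.0)      # sigma^2 tau^2 / (sigma^2 + tau^2)
values = [perfect_learning_variance(n) for n in (1, 2, 10, 100, 10 ** 6)]
print(values, benchmark)
```

For $\sigma = \tau = 1$ the values should decrease toward $0.5$ as $n$ grows, whereas the fixed-response and best-response limits computed earlier in this section stay near $0.550$ and $0.555$.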

9 Conclusion 

This work can be seen as a study of natural extensions of the DeGroot model to the setting where the value 
to be learned changes over time. The most direct extension is the fixed response model. Here we show that 
while the estimate will keep moving with the true values, its variance will converge to a fixed value. However, 
in contrast to the DeGroot model, the agents are continually receiving new independent signals, and so have 
a reference point from which to evaluate the validity of their neighbors' signals. This leads us to propose the 
best response model. We show that in the case of the complete graph, best response dynamics will always 
converge to a particular fixed response that is (myopically) optimal. However, we also show that it is not 
necessarily Pareto optimal amongst all fixed responses. Finally, we show that a simple strengthening of the 
model to allow agents to remember one value can, in certain cases, lead to much improved performance. 
This can be seen not only as a critique of fixed response dynamics as being too weak to capture natural
dynamics, but also as an interesting model to be studied more in its own right.